  • Gravity model with PPML*

    Dear all.

    Firstly I would like to say I am a new Stata User.

    I am trying to run a gravity model with ppml for 133 countries from 2005 to 2018. It includes variables such as Trade (X), Distance (distw), Population (POP), Gross Domestic Product (GDP), Contiguity (contig), Common language (comlang_off), Colony (col45), GATT member (GATT), and Free trade agreement signed (fta_wto). I also want to include time-invariant fixed effects for importing (imp_TIFE) and exporting (exp_TIFE) countries, as well as time fixed effects (TFE). To do this, I created the effects as variables and included them in the equation as regressors.

    I am running the following command in Stata, but I am not sure if I am doing it correctly.

    ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto exp_TIFE imp_TIFE TFE

    The results are:

    [Image: aa.png]


    The first thing I see is a very small R2 (0.19), which worries me. Comparing these results with other studies, the coefficients on POP for exporting and importing countries and on col45 are completely different, and those on Distance and GDP are considerably lower. The other variables have similar results.

    I would like your advice on whether the results I got are in the normal range and, especially, whether I am using the command correctly.

    Any recommendation would be very useful to me.

    Kind regards.

    Carlos Abreo.




  • #2
    Hello again.
    I forgot to say that Professor Santos suggested that I use ppmlhdfe in a recent post of mine, but I am not sure how to specify the command and the fixed effects.
    Best wishes.



    • #3
      Hi Carlos,
      It looks like the specification of the fixed effects is not being coded right. You have each of them as a single variable but technically you should have a dummy variable for each fixed effect category (i.e., three sets of dummies, with unique dummies for each exporter, importer, and year).

      With ppmlhdfe, it is fortunately easy to emulate these dummies. Assuming exp_TIFE, imp_TIFE, TFE are coded numerically, you can type

      ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto, absorb(exp_TIFE imp_TIFE TFE)

      And this will specify the fixed effects properly for you. However, that said, I suggest first familiarizing yourself with how to estimate the fixed effects the hard way. Try the following:

      xi i.exp_TIFE i.imp_TIFE i.TFE

      and notice how this creates three sets of dummy variables corresponding to exporter, importer, and year (open up your data window and browse through them). These dummies should be respectively indexed as _Iexp_TIFE*, _IimpTIFE* and _ITFE*. Now estimate the following:

      ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto _Iexp_TIFE* _IimpTIFE* _ITFE*

      Now that your regression includes all these dummies separately, it will have many more covariates and will absorb much more information. Your R^2 should be much higher in this case (you were right to flag that as a concern.)

      Hope this is helpful!

      Regards,
      Tom



      • #4
        Hello Tom.

        Thanks for your response.

        As you said, I created each effect as a single numerical variable; now I know that is not the proper way to code it.

        I typed:
        ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto, absorb(exp_TIFE imp_TIFE TFE)

        But Stata said:
        option absorb() not allowed
        r(198);

        Then I typed:
        xi i.exp_TIFE i.imp_TIFE i.TFE
        And Stata created the sets of dummy variables for each effect I want to include. That was very good advice, and it will be very helpful to me.

        Then I typed:
        ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto _Iexp_TIFE* _IimpTIFE* _ITFE*

        But Stata said:
        variable _IimpTIFE* not found
        r(111);

        I am attaching an image where you can see that the variables have been created.

        [Image: zz.png]


        Lastly, I would like to know why the new sets of dummy variables start from TIFE_2 instead of TIFE_1.

        I hope you can help me.

        Best wishes.

        Carlos.



        • #5
          Hi Carlos,
          Sorry I was not very careful in my last reply. The first line of code I typed should be

          ppmlhdfe x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto, absorb(exp_TIFE imp_TIFE TFE)

          such that it uses a different command called ppmlhdfe (which you will need to install from ssc).
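
          A minimal sketch of that installation step. It assumes an internet connection and that ppmlhdfe builds on the ftools and reghdfe packages, which are also distributed through SSC:

          ```stata
          * Install ppmlhdfe and the packages it builds on from SSC
          ssc install ftools, replace
          ssc install reghdfe, replace
          ssc install ppmlhdfe, replace

          * Confirm that Stata can now find the command
          which ppmlhdfe
          ```

          If which reports a file location and version, the absorb() option should then be available through ppmlhdfe.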

          The created fixed effect dummies start at 2 because a complete set of dummies is perfectly collinear with the constant in the model (and with the other complete sets): the dummies within each set sum to one for every observation, so one category per set must be omitted. Thus, all fixed effects estimates must be interpreted relative to the omitted category (in this case 1). Note this is similar to how dummy variable estimates are interpreted in general, i.e., dummies for "high school grad" and "college grad" can only tell you the effects of these variables relative to being neither a high school grad nor a college grad.
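
          The omitted category can be seen on a small example. The sketch below uses Stata's bundled auto dataset rather than the trade data in this thread, so the variable rep78 is only an illustrative stand-in for a fixed effect identifier:

          ```stata
          * Illustration on Stata's bundled auto data (not the trade data)
          sysuse auto, clear

          * rep78 takes the values 1 to 5
          tabulate rep78

          * xi creates _Irep78_2 to _Irep78_5 and omits the first category
          xi i.rep78
          describe _Irep78_*

          * Each coefficient is measured relative to the omitted category rep78 == 1
          regress price _Irep78_*
          ```

          The same logic applies to the exporter, importer, and year dummies: each set loses its first category, which becomes the reference group.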

          Sorry again about the confusion; hope this clarifies.

          Regards,
          Tom



          • #6
            Hi Tom.

            I typed the line:

            ppmlhdfe x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto, absorb(exp_TIFE imp_TIFE TFE)


            And these are the results I got:

            [Image: bb.png]



            I can see the R2 is considerably larger than before, and the Distance results are more consistent with the results of other studies. But the GDP and POP results (two of the most important variables in gravity models) are much lower than those of other studies. I would like to know if I am doing something wrong.

            I would also like to know what the command would be to estimate the same model with PPML instead of PPMLHDFE. I am asking because I do not really know the difference between PPML and PPMLHDFE; what little I know is that PPMLHDFE handles the fixed effects better than PPML.

            Thanks again for your help.

            Regards.

            Carlos.



            • #7
              Hi Carlos,

              The reason your estimates for GDP may be different is because you also include the log of each country's population (the two are likely very correlated.) In any case, GDPs are usually treated as only a control; I would not worry about whether the coefficients are 1 or not.

              To estimate the same model using ppml, I believe you should be able to input:

              xi i.exp_TIFE i.imp_TIFE i.TFE
              ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto _Iexp_TIFE* _IimpTIFE* _ITFE*


              This exercise should also help to clarify the difference between the two commands: ppml doesn't treat fixed effects any differently than other regressors, whereas ppmlhdfe uses a special algorithm that avoids creating a new column for each dummy variable that would be needed. This makes ppmlhdfe much faster for problems that involve fixed effects and similar regressors.

              Regards,
              Tom



              • #8
                Hello Tom.

                About the low GDP results, you said they arise because I log the POP variables. What can I do to get results similar to those of other studies (in some studies the results are between 0.7 and 0.8)? Should I not include the POP variables in logs? Or should I do something different?

                Then I typed the command:

                ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto _Iexp_TIFE* _IimpTIFE* _ITFE*

                And Stata says:
                variable _IimpTIFE* not found
                r(111);

                As for the difference between PPML and PPMLHDFE, thanks for clarifying my doubt.

                Thanks again for your help.

                Regards.

                Carlos.



                • #9
                  Hi Carlos,

                  To be clear, it's not about whether you log population or not. It's that population is sometimes not included at all. Also, as I said before, I would not place much emphasis at all on whether the coefficient on log GDP is close to 1 or not.

                  For the second part, it looks like you still need to create the dummies using xi. I hope this all makes sense by the way - the dummies you create using the xi command are what allow you to account for the fixed effects when you use ppml. This is essentially what fixed effects estimation is: the use of category-specific dummy variables to account for any variation that is specific to those categories. Here is an explainer that might be helpful: https://are.berkeley.edu/courses/EEP...ed_effects.pdf

                  Regards,
                  Tom



                  • #10
                    Hello Tom.

                    Thanks again for your response.

                    Actually I created the dummy variables with the command you typed:

                    xi i.exp_TIFE i.imp_TIFE i.TFE

                    [Image: ss.png]


                    But when I typed the next command you gave, Stata said:
                    variable _IimpTIFE* not found
                    r(111);

                    Nevertheless, I am going to follow your reading suggestion in order to better understand how to create dummy variables in Stata.

                    Kind regards.

                    Carlos.



                    • #11
                      Hi Carlos,

                      Again I was not careful enough: note that the dummy variables that have been saved in your data set have an additional "_" (i.e., _Iimp_TIFE* instead of _IimpTIFE*). That should fix the error.
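
                      One way to avoid guessing the generated names is to list them before building the wildcards. This is only a sketch: it assumes the variable names used earlier in this thread and that no other variables in the data begin with _I:

                      ```stata
                      * Create the dummies, then list the names xi actually assigned
                      xi i.exp_TIFE i.imp_TIFE i.TFE
                      describe _I*

                      * Build the wildcards from the prefixes shown by describe
                      ppml x log_distw log_pop_d log_pop_o log_gdp_d log_gdp_o contig comlang_off col45 gatt_o gatt_d fta_wto _Iexp_TIFE* _Iimp_TIFE* _ITFE*
                      ```

                      Checking the output of describe first makes a misspelled prefix easy to spot before the estimation command fails with r(111).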

                      Regards,
                      Tom



                      • #12
                        Hello Tom.

                        Now it worked very well. The results were exactly the same as with the PPMLHDFE command.

                        Again, thanks for your help, suggestions and, above all, for your time.

                        Kind regards.

                        Carlos.



                        • #13
                          Glad to hear it.



                          • #14
                            Dear all,
                            I'm trying to estimate a gravity model of migration using ppmlhdfe as follows:

                            ppmlhdfe x l.lgdppco l.lturo l.lpopo l.lgdppcd l.lturd l.lpopd , absorb(year From To)

                            where x=people migrating from origin i to destination j, lgdppco=(log of) per capita GDP at origin; lturo=(log of) unemployment rate at origin; lpopo=(log of) population at origin; lgdppcd=(log of) per capita GDP at destination; lturd=(log of) unemployment rate at destination; lpopd=(log of) population at destination. Moreover year, From and To are (numeric) dummy variables for time, origin and destination. After running the command the following error message appears:

                            remove_collinears(): 3499 selectindex() not found
                            GLM::init_variables(): - function returned error
                            <istmt>: - function returned error

                            Could anyone explain what is going wrong?

                            Grateful for any help with this.

                            Best regards,
                            Romano



                            • #15
                              Hi Romano,
                              What version of Stata are you using? The selectindex() Mata function may only be backwards compatible going back to Stata 12 or so.
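
                              A quick way to collect that information is sketched below; it assumes ppmlhdfe is already installed:

                              ```stata
                              * Version of the Stata executable that is running
                              display c(stata_version)

                              * Edition, version, and license details
                              about

                              * Version and location of the installed ppmlhdfe
                              which ppmlhdfe
                              ```

                              Posting this output alongside the error message makes it easier to tell whether the installed Stata is too old for the Mata functions the command calls.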
                              Regards,
                              Tom

